Abstract: The paper presents the Entropy Semiring Forward-backward algorithm (ESRFB) and its application to the memory-efficient computation of the subsequence constrained entropy and the state sequence entropy of a Hidden Markov Model (HMM) when an observation sequence is given. ESRFB is based on forward-backward recursion over the entropy semiring. It has a lower memory requirement than the algorithm developed by Mann and McCallum, with the same time complexity. Furthermore, when it is used with the forward pass only, it is applicable to the computation of the HMM entropy for a given observation sequence, with the same time and memory complexity as the previously developed algorithm by Hernando et al.



I. Introduction 

Hidden Markov Models (HMMs) are standard probabilistic models for state sequences in sequential data labeling [?]. The subsequence constrained entropy of an HMM explaining an observation sequence and the state sequence entropy are useful quantities which measure the uncertainty of the HMM. One criterion for estimating the quality of an HMM is the entropy of the state sequence explaining an observation sequence, which measures the uncertainty of that state sequence [?].

Algorithms for HMMs mostly rely on efficient marginalization, which is usually performed using the forward-backward (FB) algorithm [8]; it runs in $O(N^2T)$ time, where $N$ denotes the number of states and $T$ is the length of the sequence. Recently, Mann and McCallum developed an algorithm for computing the subsequence constrained entropy of conditional random fields (CRFs), a probabilistic model similar to the HMM. Their algorithm is based on the computation of marginal probabilities [5] and has the same asymptotic complexity as FB. It can be adapted to work with HMMs, but when the sequence is long it becomes memory demanding, since it needs $O(NT)$ memory. On the other hand, Hernando et al. [6] developed a memory-efficient algorithm for state sequence entropy computation which requires $O(N)$ memory. That algorithm has the same time complexity as FB, but it is not applicable to the computation of the subsequence constrained entropy.

In this paper we develop a new algorithm which can be used for both types of computation. The algorithm is based on forward-backward recursion over the entropy semiring [7] and is called the Entropy Semiring Forward-backward algorithm (ESRFB). ESRFB has a lower memory requirement than Mann and McCallum's algorithm for subsequence constrained entropy computation. Furthermore, when it is used with the forward pass only, it computes the entropy with the same time and space complexity as Hernando et al.'s algorithm. Moreover, we show how Hernando et al.'s algorithm can be derived from ESRFB.

The paper is organized as follows. In Section II we define the HMM and present the forward-backward (FB) algorithm for efficient marginalization of the HMM. Section III reviews the algorithms by Mann and McCallum and by Hernando et al. for efficient computation of the HMM entropy and the subsequence constrained entropy for a given observation sequence. Section IV gives the general FB algorithm, which operates over a commutative semiring. Finally, Section V considers FB over the entropy semiring and its application to HMM entropy computation.

II. Hidden Markov Models and the forward-backward algorithm
In this paper, we adopt the following notation:

• The sequence $l, l+1, \dots, r$ is shortly denoted by $l:r$, and the sequence $0:l-1,\ r+1:T$ is denoted by $-l:r$.

• Capital letters are used for random variables ($S_t$, $O_t$) and lowercase letters for their realizations ($s_t$, $o_t$).

• The sequence of symbols $(S_l, \dots, S_r)$ is denoted by $S_{l:r}$, the sequence $(S_0, \dots, S_{l-1}, S_{r+1}, \dots, S_T)$ by $S_{-l:r}$, and similarly for $s_{l:r}$, $O_{l:r}$ and $o_{l:r}$, for $0 \le l \le r \le T$.

• The sequences $S_{0:T}$, $O_{0:T}$, $s_{0:T}$ and $o_{0:T}$ are denoted by $S$, $O$, $s$ and $o$, respectively.

• The variables are omitted in probability notation. Thus, $p(s_t, o_{0:t})$ stands for $P(S_t = s_t, O_{0:t} = o_{0:t})$, $p(o)$ for $P(O = o)$, and so on.

A hidden Markov model (HMM) consists of the following elements:

• A Markov chain $(S_0, \dots, S_T)$, represented by an $N \times N$ stochastic matrix $A$, which describes the transition probabilities $a_{ij} = P(S_t = j \mid S_{t-1} = i)$ between the $N$ states of the model, together with a probability distribution $\pi$, where $\pi_i = P(S_0 = i)$.

• A set of probability distributions, one for each hidden state, $b_i(o_t) = P(O_t = o_t \mid S_t = i)$, which model the emission of observations. If there are $M$ possible distinct observations, we arrange these distributions as the rows of an $N \times M$ matrix $B$.

With these settings, the joint probability that the state sequence $S$ takes the value $s$ and the observation sequence $O$ takes the value $o$ is given by:

$$p(s, o) = \pi_{s_0} b_{s_0}(o_0) \prod_{t=1}^{T} a_{s_{t-1}s_t} b_{s_t}(o_t). \tag{1}$$

Using the probability conditions $\sum_{s_t} a_{s_{t-1}s_t} = 1$ and $\sum_{o_t} b_{s_t}(o_t) = 1$, we can derive two important equations which characterize the HMM and will be used in the rest of the paper:

$$p(s_{0:l}, o_{0:l}) = \pi_{s_0} b_{s_0}(o_0) \prod_{t=1}^{l} a_{s_{t-1}s_t} b_{s_t}(o_t), \tag{2}$$

$$p(s_{l+1:T}, o_{l+1:T} \mid s_l) = \prod_{t=l+1}^{T} a_{s_{t-1}s_t} b_{s_t}(o_t). \tag{3}$$

One of the main problems in HMMs is the efficient marginalization of the HMM conditional probability $p(s \mid o)$:

$$p(s_{l:r} \mid o) = \sum_{s_{-l:r}} p(s \mid o) = \sum_{s_{-l:r}} \frac{p(s, o)}{p(o)}. \tag{4}$$

The computation of (4) by enumerating all $s \in \{1, \dots, N\}^{T+1}$ requires about $TN^{T+1}$ additions and multiplications, which would be infeasible even for small values of $N$ and $T$ (for $N = 10$ and $T = 20$, the total number of operations is of the order $10^{22}$). A more efficient way is the forward-backward (FB) algorithm, which solves the problem using $O(N^2T)$ operations. In this paper we present a numerically stable variant of FB. For other variants see [10], [?], [?].

The forward-backward algorithm recursively computes the desired quantities using the HMM forward and backward probabilities. If we introduce

$$\alpha_t(s_t) = p(s_t \mid o_{0:t}), \qquad \beta_t(s_t) = \frac{p(o_{t+1:T} \mid s_t)}{p(o_{t+1:T} \mid o_{0:t})}, \tag{5}$$

they can be computed as follows.

1) Forward initialization: For $1 \le j \le N$:

$$\alpha_0(j) = \frac{\pi_j b_j(o_0)}{c_0}, \qquad c_0 = \sum_{j=1}^{N} \pi_j b_j(o_0). \tag{6}$$

2) Forward recursion: For $1 \le t \le T$, $1 \le j \le N$:

$$\alpha_t(j) = \frac{\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} b_j(o_t)}{c_t}, \tag{7}$$

$$c_t = \sum_{j=1}^{N} \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} b_j(o_t). \tag{8}$$

3) Backward initialization: For $1 \le i \le N$:

$$\beta_T(i) = 1. \tag{9}$$

4) Backward recursion: For $T-1 \ge t \ge 0$, $1 \le i \le N$:

$$\beta_t(i) = \frac{\sum_{j=1}^{N} a_{ij} b_j(o_{t+1})\, \beta_{t+1}(j)}{c_{t+1}}. \tag{10}$$

The normalization factors $c_t$ ensure that the probabilities sum to one, and they represent the conditional observation probabilities:

$$c_0 = p(o_0), \qquad c_t = p(o_t \mid o_{0:t-1}). \tag{11}$$

Once the forward and backward probabilities are computed, we can compute the marginal as

$$p(s_{l:r} \mid o) = \alpha_l(s_l) \prod_{t=l+1}^{r} \frac{a_{s_{t-1}s_t} b_{s_t}(o_t)}{c_t}\, \beta_r(s_r). \tag{12}$$

The two most commonly used marginals, $p(s_{t-1:t} \mid o)$ and $p(s_t \mid o)$, can be computed as follows:

$$p(s_{t-1:t} \mid o) = \frac{\alpha_{t-1}(s_{t-1})\, a_{s_{t-1}s_t} b_{s_t}(o_t)\, \beta_t(s_t)}{c_t}, \tag{13}$$

$$p(s_t \mid o) = \alpha_t(s_t)\, \beta_t(s_t). \tag{14}$$

The majority of the computations are performed in the forward and backward recursion phases, which results in the time complexity $O(N^2T)$. Storing all forward and backward vectors along with the normalization factors requires $O(NT)$ memory.

III. Entropy computation of Hidden Markov Models

The conditional entropy of the HMM is given by

$$H(S \mid o) = -\sum_{s} p(s \mid o) \log p(s \mid o), \tag{15}$$

while the subsequence constrained entropy is

$$H(S_{-l:r} \mid s_{l:r}, o) = -\sum_{s_{-l:r}} p(s_{-l:r} \mid s_{l:r}, o) \log p(s_{-l:r} \mid s_{l:r}, o). \tag{16}$$

If we introduce

$$H(S_{-l:r}, s_{l:r} \mid o) = -\sum_{s_{-l:r}} p(s \mid o) \log p(s \mid o), \tag{17}$$

then, since $p(s \mid o) = p(s_{l:r} \mid o)\, p(s_{-l:r} \mid s_{l:r}, o)$, we can derive the following equality:

$$H(S_{-l:r} \mid s_{l:r}, o) = \frac{H(S_{-l:r}, s_{l:r} \mid o)}{p(s_{l:r} \mid o)} + \log p(s_{l:r} \mid o). \tag{18}$$

A direct evaluation of (17) is infeasible, as the sum has $N^{T-r+l}$ terms. In the following text we consider efficient algorithms for the entropy computation. First, in the next two subsections, we review two algorithms based on the entropy decomposition rules

$$H(X, Y) = H(X) + H(Y \mid X), \tag{19}$$

$$H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x). \tag{20}$$

After that, in Section V, we derive the new algorithms based on the ESRFB algorithm.

A. The algorithm by Mann and McCallum

Mann and McCallum proposed an algorithm for computing the entropy gradient of linear-chain conditional random fields [9], which can also be used for HMMs. The algorithm uses the conditional probabilities

$$p_{t|t+1}(i \mid j) = p(s_t \mid s_{t+1}, o) = \frac{p(s_{t:t+1} \mid o)}{p(s_{t+1} \mid o)}, \tag{21}$$

$$p_{t|t-1}(i \mid j) = p(s_t \mid s_{t-1}, o) = \frac{p(s_{t-1:t} \mid o)}{p(s_{t-1} \mid o)}, \tag{22}$$

which, in turn, are computed using the FB algorithm, and the forward and backward entropies, which are computed with a recursive procedure based on the entropy decomposition formulas (19) and (20). The forward entropy $H^{\alpha}_t(s_t)$ at time $t$ is defined as the entropy of the state sequence $S_{0:t-1}$ which ends in $s_t$, for a given observation sequence $o$:

$$H^{\alpha}_t(s_t) = H(S_{0:t-1} \mid s_t, o), \tag{23}$$

while the backward entropy $H^{\beta}_t(s_t)$ at time $t$ is the entropy of the state sequence $S_{t+1:T}$ which starts in $s_t$:

$$H^{\beta}_t(s_t) = H(S_{t+1:T} \mid s_t, o). \tag{24}$$

Using the forward and backward entropies, the subsequence constrained entropy and the conditional HMM entropy can be recursively computed as in the following algorithm.

1) Forward-backward algorithm: Compute and store the forward and backward probabilities using the FB algorithm.

2) Forward entropy initialization: For $1 \le j \le N$:

$$H^{\alpha}_0(j) = 0. \tag{25}$$

3) Forward entropy recursion: For $0 \le t \le T-1$, $1 \le i, j \le N$:

$$H^{\alpha}_{t+1}(j) = \sum_{i=1}^{N} p_{t|t+1}(i \mid j) \big( H^{\alpha}_t(i) - \log p_{t|t+1}(i \mid j) \big), \tag{26}$$

where $p_{t|t+1}(i \mid j)$ is computed using (21), (13) and (14).

4) Backward entropy initialization: For $1 \le j \le N$:

$$H^{\beta}_T(j) = 0. \tag{27}$$

5) Backward entropy recursion: For $T \ge t \ge 1$, $1 \le i, j \le N$:

$$H^{\beta}_{t-1}(j) = \sum_{i=1}^{N} p_{t|t-1}(i \mid j) \big( H^{\beta}_t(i) - \log p_{t|t-1}(i \mid j) \big), \tag{28}$$

where $p_{t|t-1}(i \mid j)$ is computed using (22), (13) and (14).

6) Termination: For every subsequence $s_{l:r}$:

$$H(S_{-l:r}, s_{l:r} \mid o) = p(s_{l:r} \mid o) \big( H^{\alpha}_l(s_l) + H^{\beta}_r(s_r) - \log p(s_{l:r} \mid o) \big), \tag{29}$$

$$H(S_{-l:r} \mid s_{l:r}, o) = \frac{H(S_{-l:r}, s_{l:r} \mid o)}{p(s_{l:r} \mid o)} + \log p(s_{l:r} \mid o). \tag{30}$$

The time complexity of the algorithm is $O(N^2T + N^{r-l})$, where $O(N^2T)$ is for the forward-backward and entropy recursions and $O(N^{r-l})$ for the termination phase. The memory complexity depends on the sequence length, since all forward and backward vectors must be available in the forward and backward entropy recursion phases; together with the $O(N^{r-l})$ space required for storing the results of the termination phase, the total memory complexity is $O(NT + N^{r-l})$.

The algorithm can also be used for the computation of the entropy using the equality

$$H(S \mid o) = H(S_T \mid o) + \sum_{s_T} p(s_T \mid o)\, H^{\alpha}_T(s_T), \tag{31}$$

which follows from the entropy decomposition formulas and the definition of the forward entropy. In this case the backward entropy pass is not needed, but the time and memory complexity are not reduced, since the forward and backward probabilities still need to be computed. In the following subsection we review the algorithm developed in [6] by Hernando et al., which computes the entropy with a memory complexity independent of the sequence length.

B. The algorithm by Hernando et al.

In [6], Hernando et al. develop a recursive algorithm for the computation of the Hidden Markov model entropy. It uses the HMM forward probability

$$\alpha_t(s_t) = p(s_t \mid o_{0:t}), \tag{32}$$

the conditional probability

$$p_{t-1|t}(s_{t-1} \mid s_t) = p(s_{t-1} \mid s_t, o_{0:t}), \tag{33}$$

and the intermediate entropy

$$H_t(s_t) = H(S_{0:t-1} \mid s_t, o_{0:t}). \tag{34}$$

The HMM entropy is computed as follows.

1) Initialization: For $1 \le j \le N$ set:

$$H_0(j) = 0, \tag{35}$$

$$\alpha_0(j) = \frac{\pi_j b_j(o_0)}{\sum_{i=1}^{N} \pi_i b_i(o_0)}. \tag{36}$$

2) Induction: For $1 \le t \le T$ and $1 \le j \le N$ set:

$$\alpha_t(j) = \frac{\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} b_j(o_t)}{\sum_{k=1}^{N} \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ik} b_k(o_t)}, \tag{37}$$

$$p_{t-1|t}(i \mid j) = \frac{\alpha_{t-1}(i)\, a_{ij}}{\sum_{k=1}^{N} \alpha_{t-1}(k)\, a_{kj}}, \tag{38}$$

$$H_t(j) = \sum_{i=1}^{N} p_{t-1|t}(i \mid j) \big( H_{t-1}(i) - \log p_{t-1|t}(i \mid j) \big). \tag{39}$$

3) Termination:

$$H(S \mid o) = \sum_{j=1}^{N} \alpha_T(j) \big( H_T(j) - \log \alpha_T(j) \big). \tag{40}$$

The algorithm runs with the linear time complexity $O(N^2T)$ and in fixed memory space, independent of the sequence length, since the vectors $\alpha_{t-1}$, $H_{t-1}$ and the matrix $p_{t-1|t}$ are needed only in the $t$-th iteration and can be deleted after having been used for the computation of $\alpha_t$ and $H_t$. Storing the whole matrix $p_{t-1|t}$ takes $O(N^2)$ space; since each $H_t(j)$ needs only the column $p_{t-1|t}(\cdot \mid j)$, forming one column at a time reduces the working memory to $O(N)$.
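The recursion (35)–(40) can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the authors' code: the function names `hernando_entropy` and `brute_force_entropy` and the toy parameters `pi`, `A`, `B`, `obs` are hypothetical, states and symbols are 0-based indices, and the brute-force enumeration of (15) is there only to check the result on a tiny model.

```python
import itertools
import math


def hernando_entropy(pi, A, B, obs):
    """H(S|o) via (35)-(40); only alpha_{t-1} and H_{t-1} survive between steps."""
    N = len(pi)
    w = [pi[j] * B[j][obs[0]] for j in range(N)]
    c0 = sum(w)
    alpha = [x / c0 for x in w]                 # (36)
    H = [0.0] * N                               # (35)
    for o in obs[1:]:
        num, new_H = [], []
        for j in range(N):
            # Column j of p_{t-1|t}, formed on the fly (O(N) working memory).
            col = [alpha[i] * A[i][j] for i in range(N)]
            pred = sum(col)
            h = 0.0
            if pred > 0.0:
                for i in range(N):
                    p = col[i] / pred                       # (38)
                    if p > 0.0:
                        h += p * (H[i] - math.log(p))       # (39)
            new_H.append(h)
            num.append(pred * B[j][o])                      # numerator of (37)
        ct = sum(num)
        alpha = [x / ct for x in num]                       # (37)
        H = new_H
    # Termination (40)
    return sum(a * (h - math.log(a)) for a, h in zip(alpha, H) if a > 0.0)


def brute_force_entropy(pi, A, B, obs):
    """H(S|o) by enumerating all N**(T+1) state sequences, eq. (15)."""
    N = len(pi)
    joint = []
    for s in itertools.product(range(N), repeat=len(obs)):
        p = pi[s[0]] * B[s[0]][obs[0]]                      # eq. (1)
        for t in range(1, len(obs)):
            p *= A[s[t - 1]][s[t]] * B[s[t]][obs[t]]
        joint.append(p)
    p_o = sum(joint)
    return -sum(p / p_o * math.log(p / p_o) for p in joint if p > 0.0)


# Hypothetical toy model: N = 2 states, M = 3 symbols, T = 4.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1]

h_rec = hernando_entropy(pi, A, B, obs)
h_bf = brute_force_entropy(pi, A, B, obs)
print(h_rec, h_bf)
```

Only `alpha` and `H` are carried from one iteration to the next, which is the sequence-length-independent memory behaviour described above.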



IV. The forward-backward algorithm over the commutative semiring

The FB algorithm for HMMs works for more general models in which the factors in (1) are not probabilities but functions whose range is a commutative semiring [?]. In this section we present the forward-backward algorithm over a commutative semiring and derive the FB for HMMs as a special case.
A. The forward-backward algorithm over a commutative semiring

We begin with the definition of the commutative semiring.

Definition 1: A commutative semiring is a set $K$ with operations $\oplus$ and $\otimes$ such that both $\oplus$ and $\otimes$ are commutative and associative and have identity elements in $K$ ($0$ and $1$, respectively), and $\otimes$ is distributive over $\oplus$.

Let $s = (s_0, \dots, s_T)$ be a sequence of variables taking values from the set $\mathcal{S} = \{1, \dots, N\}$. We define the local kernel functions $u_0 : \mathcal{S} \to K$ and $u_t : \mathcal{S}^2 \to K$ for $t = 1, \dots, T$, and the global kernel function $u : \mathcal{S}^{T+1} \to K$, assuming that the following factorization holds

$$u(s) = u_0(s_0) \otimes \bigotimes_{t=1}^{T} u_t(s_{t-1}, s_t) \tag{41}$$

for all $s = (s_0, \dots, s_T)$.

The FB algorithm solves two problems:

1) The marginalization problem: Compute the sum

$$v_{l:r}(s_{l:r}) = \bigoplus_{s_{-l:r}} u(s). \tag{42}$$

2) The normalization problem: Compute the sum

$$z = \bigoplus_{s} u(s). \tag{43}$$



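Similarly as for the HMM, the algorithm recursively computes the forward variable

$$\alpha_l(s_l) = \bigoplus_{s_{0:l-1}} u_0(s_0) \otimes \bigotimes_{t=1}^{l} u_t(s_{t-1}, s_t), \tag{44}$$

which is initialized to

$$\alpha_0(s_0) = u_0(s_0) \tag{45}$$

and recursively computed using

$$\alpha_l(s_l) = \bigoplus_{s_{l-1}} u_l(s_{l-1}, s_l) \otimes \alpha_{l-1}(s_{l-1}), \tag{46}$$

and the backward variable

$$\beta_l(s_l) = \bigoplus_{s_{l+1:T}} \bigotimes_{t=l+1}^{T} u_t(s_{t-1}, s_t), \tag{47}$$

which is recursively computed using

$$\beta_l(s_l) = \bigoplus_{s_{l+1}} u_{l+1}(s_l, s_{l+1}) \otimes \beta_{l+1}(s_{l+1}) \tag{48}$$

and initialized to

$$\beta_T(s_T) = 1. \tag{49}$$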



Once the forward variable $\alpha_l$ and the backward variable $\beta_r$ are computed, we can solve the marginalization problem using the formula

$$v_{l:r}(s_{l:r}) = \alpha_l(s_l) \otimes \bigotimes_{t=l+1}^{r} u_t(s_{t-1}, s_t) \otimes \beta_r(s_r). \tag{50}$$

The normalization problem can be solved with the forward pass only, according to

$$\bigoplus_{s} u(s) = \bigoplus_{s_T} \alpha_T(s_T). \tag{51}$$
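The recursions (45)–(49) and the formulas (50)–(51) use only $\oplus$, $\otimes$ and the two identities, so they can be written once for any semiring. The following is a minimal sketch under stated assumptions: the `Semiring` container and the functions `forward`, `backward` and `marginal` are hypothetical names, and it instantiates the real sum-product semiring with the unnormalized HMM kernels $u_0(s_0) = \pi_{s_0} b_{s_0}(o_0)$ and $u_t(s_{t-1}, s_t) = a_{s_{t-1}s_t} b_{s_t}(o_t)$ taken from (1), so that the normalization problem returns $p(o)$ and the marginal returns $p(s_{l:r}, o)$ (the normalized kernels used in the next subsection additionally need the factors $c_t$).

```python
import itertools
import operator
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]     # the operation "oplus"
    times: Callable[[Any, Any], Any]    # the operation "otimes"
    zero: Any                           # identity of plus
    one: Any                            # identity of times


def forward(sr, u0, u, N, T):
    """alpha_0 = u_0, eq. (45); alpha_t(j) = (+)_i u_t(i, j) (x) alpha_{t-1}(i), eq. (46)."""
    alphas = [[u0(j) for j in range(N)]]
    for t in range(1, T + 1):
        prev, cur = alphas[-1], []
        for j in range(N):
            acc = sr.zero
            for i in range(N):
                acc = sr.plus(acc, sr.times(u(t, i, j), prev[i]))
            cur.append(acc)
        alphas.append(cur)
    return alphas


def backward(sr, u, N, T):
    """beta_T = 1, eq. (49); beta_t(i) = (+)_j u_{t+1}(i, j) (x) beta_{t+1}(j), eq. (48)."""
    betas = [[sr.one] * N]
    for t in range(T - 1, -1, -1):
        nxt, cur = betas[0], []
        for i in range(N):
            acc = sr.zero
            for j in range(N):
                acc = sr.plus(acc, sr.times(u(t + 1, i, j), nxt[j]))
            cur.append(acc)
        betas.insert(0, cur)
    return betas


def marginal(sr, alphas, betas, u, l, seg):
    """v_{l:r}(s_{l:r}) from eq. (50); seg = (s_l, ..., s_r)."""
    r = l + len(seg) - 1
    val = alphas[l][seg[0]]
    for t in range(l + 1, r + 1):
        val = sr.times(val, u(t, seg[t - l - 1], seg[t - l]))
    return sr.times(val, betas[r][seg[-1]])


SUM_PRODUCT = Semiring(operator.add, operator.mul, 0.0, 1.0)

# Hypothetical toy HMM.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1]
N, T = len(pi), len(obs) - 1


def u0(j):
    return pi[j] * B[j][obs[0]]


def u(t, i, j):
    return A[i][j] * B[j][obs[t]]


alphas = forward(SUM_PRODUCT, u0, u, N, T)
betas = backward(SUM_PRODUCT, u, N, T)
p_o = sum(alphas[T])                                       # eq. (51): z = p(o)
v = marginal(SUM_PRODUCT, alphas, betas, u, 1, (0, 1))     # p(s_1 = 0, s_2 = 1, o)


def p_joint(s):
    """Product of the local kernels along one state sequence, eq. (1)."""
    p = u0(s[0])
    for t in range(1, T + 1):
        p *= u(t, s[t - 1], s[t])
    return p


all_s = list(itertools.product(range(N), repeat=T + 1))
p_o_bf = sum(p_joint(s) for s in all_s)
v_bf = sum(p_joint(s) for s in all_s if s[1:3] == (0, 1))
print(p_o, p_o_bf, v / p_o)
```

Replacing `SUM_PRODUCT` by another `Semiring` instance changes what is computed without touching `forward`, `backward` or `marginal`; this is the freedom exploited below with the entropy semiring.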



In the following subsection we derive the FB algorithm 
for HMMs as a special case of the FB over the commutative 
semiring. 

B. HMM forward-backward as a special case of the forward-backward algorithm over the commutative semiring

The conditional HMM probability $p(s \mid o)$ can be seen as a special case of the global kernel factorization (41) if $\oplus$ and $\otimes$ stand for the addition and multiplication of real numbers. To clarify this, recall that the joint HMM probability (1) has the form

$$p(s, o) = \pi_{s_0} b_{s_0}(o_0) \prod_{t=1}^{T} a_{s_{t-1}s_t} b_{s_t}(o_t), \tag{52}$$

and that, according to the chain rule, the probability of the observation sequence can be represented as

$$p(o_{0:T}) = p(o_0) \prod_{t=1}^{T} p(o_t \mid o_{0:t-1}) = c_0 \prod_{t=1}^{T} c_t, \tag{53}$$

where $c_0 = p(o_0)$ and $c_t = p(o_t \mid o_{0:t-1})$ as in (11). Then,

$$p(s \mid o) = \frac{p(s, o)}{p(o)} = z_0(s_0) \prod_{t=1}^{T} z_t(s_{t-1}, s_t), \tag{54}$$

where

$$z_0(s_0) = \frac{\pi_{s_0} b_{s_0}(o_0)}{c_0}, \qquad z_t(s_{t-1}, s_t) = \frac{a_{s_{t-1}s_t} b_{s_t}(o_t)}{c_t}. \tag{55}$$

According to equation (2), the subsequence conditional probabilities can be represented as $p(s_{0:l} \mid o_{0:l}) = z_0(s_0) \prod_{t=1}^{l} z_t(s_{t-1}, s_t)$, and the forward variable (44) has the form

$$\alpha_l(s_l) = \sum_{s_{0:l-1}} z_0(s_0) \prod_{t=1}^{l} z_t(s_{t-1}, s_t) = p(s_l \mid o_{0:l}), \tag{56}$$

in agreement with (5). The recursive equations (45) and (46) for the forward variable have the form

$$\alpha_0(s_0) = z_0(s_0) = \frac{\pi_{s_0} b_{s_0}(o_0)}{c_0}, \tag{57}$$

$$\alpha_t(s_t) = \sum_{s_{t-1}} z_t(s_{t-1}, s_t)\, \alpha_{t-1}(s_{t-1}) = \frac{\sum_{s_{t-1}} a_{s_{t-1}s_t} b_{s_t}(o_t)\, \alpha_{t-1}(s_{t-1})}{c_t}, \tag{58}$$

and the normalization factors can be computed using the probability condition

$$\sum_{s_t} \alpha_t(s_t) = \sum_{s_t} p(s_t \mid o_{0:t}) = 1, \tag{59}$$

which gives

$$c_0 = \sum_{j=1}^{N} \pi_j b_j(o_0), \qquad c_t = \sum_{j=1}^{N} \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} b_j(o_t), \tag{60}$$
and the forward variable and its recursive equations agree with the equations from Section II. Similarly, according to equation (3), $\prod_{t=l+1}^{T} z_t(s_{t-1}, s_t) = p(s_{l+1:T}, o_{l+1:T} \mid s_l) / p(o_{l+1:T} \mid o_{0:l})$, so the backward variable is

$$\beta_l(s_l) = \sum_{s_{l+1:T}} \prod_{t=l+1}^{T} z_t(s_{t-1}, s_t) = \frac{p(o_{l+1:T} \mid s_l)}{p(o_{l+1:T} \mid o_{0:l})}, \tag{61}$$

with the recursive equations

$$\beta_T(s_T) = 1, \tag{62}$$

$$\beta_t(s_t) = \frac{\sum_{s_{t+1}} a_{s_t s_{t+1}} b_{s_{t+1}}(o_{t+1})\, \beta_{t+1}(s_{t+1})}{c_{t+1}}. \tag{63}$$

Finally, in the same manner, equation (50) reduces to (12):

$$p(s_{l:r} \mid o) = \alpha_l(s_l) \prod_{t=l+1}^{r} \frac{a_{s_{t-1}s_t} b_{s_t}(o_t)}{c_t}\, \beta_r(s_r), \tag{64}$$

which retrieves the HMM forward-backward algorithm.

V. The forward-backward algorithm over the entropy semiring

In this section we consider the forward-backward algorithm over the entropy semiring (ESRFB) and its application to HMM entropy computation. The entropy semiring (ESR), which was introduced in [3] and [5], is defined as follows.

Definition 2: The entropy semiring is the commutative semiring for which $K = \mathbb{R}^2$ and the semiring operations are defined by

$$(z_1, h_1) \oplus (z_2, h_2) = (z_1 + z_2,\ h_1 + h_2), \tag{65}$$

$$(z_1, h_1) \otimes (z_2, h_2) = (z_1 z_2,\ z_1 h_2 + z_2 h_1), \tag{66}$$

for all $(z_1, h_1), (z_2, h_2) \in K$. The identities for $\oplus$ and $\otimes$ are $(0, 0)$ and $(1, 0)$, respectively.

The first component of an ordered pair is called the z-part, while the second one is called the h-part. The following lemma can be proven by induction (see [?]).

Lemma 1: Let $(z_i, z_i h_i) \in \mathbb{R}^2$ for all $0 \le i \le T$. Then the following equality holds:

$$\bigotimes_{i=0}^{T} (z_i, z_i h_i) = \Big( \prod_{i=0}^{T} z_i,\ \prod_{i=0}^{T} z_i \cdot \sum_{j=0}^{T} h_j \Big). \tag{67}$$

Let the local kernels in (41) have the form

$$u_0(s_0) = \big( z_0(s_0),\ z_0(s_0)\, h_0(s_0) \big), \tag{68}$$

$$u_t(s_{t-1}, s_t) = \big( z_t(s_{t-1}, s_t),\ z_t(s_{t-1}, s_t)\, h_t(s_{t-1}, s_t) \big), \tag{69}$$

where

$$z_0(s_0) = \frac{\pi_{s_0} b_{s_0}(o_0)}{c_0}, \qquad z_t(s_{t-1}, s_t) = \frac{a_{s_{t-1}s_t} b_{s_t}(o_t)}{c_t}, \tag{70}$$

with $c_0 = p(o_0)$, $c_t = p(o_t \mid o_{0:t-1})$, and

$$h_0(s_0) = \log z_0(s_0), \qquad h_t(s_{t-1}, s_t) = \log z_t(s_{t-1}, s_t). \tag{71}$$

From Lemma 1 it follows that the z and h parts of the global kernel

$$u(s) = u_0(s_0) \otimes \bigotimes_{t=1}^{T} u_t(s_{t-1}, s_t) \tag{72}$$

are given by

$$u(s)^{(z)} = z_0(s_0) \prod_{t=1}^{T} z_t(s_{t-1}, s_t), \tag{73}$$

$$u(s)^{(h)} = z_0(s_0) \prod_{t=1}^{T} z_t(s_{t-1}, s_t) \Big( h_0(s_0) + \sum_{t=1}^{T} h_t(s_{t-1}, s_t) \Big). \tag{74--75}$$

Note that

$$h_0(s_0) + \sum_{t=1}^{T} h_t(s_{t-1}, s_t) = \log \Big( z_0(s_0) \prod_{t=1}^{T} z_t(s_{t-1}, s_t) \Big), \tag{76}$$

and, according to the factorization (54) of the HMM conditional probability, $p(s \mid o) = z_0(s_0) \prod_{t=1}^{T} z_t(s_{t-1}, s_t)$, we can represent the global kernel as follows:

$$u(s) = \big( p(s \mid o),\ p(s \mid o) \log p(s \mid o) \big). \tag{77}$$

Hence, by summing the global kernel we obtain the entropy $H(S \mid o)$ or $H(S_{-l:r}, s_{l:r} \mid o)$ as the negated h-part of the sum, depending on the set of variables which are summed out. The two types of summation correspond to the normalization and the marginalization of the global kernel, which can be solved with the forward-backward algorithm over the entropy semiring.

The z and h parts of the forward and backward variables in the entropy semiring can also be derived using Lemma 1. For the forward vector

$$\alpha_t(s_t) = \bigoplus_{s_{0:t-1}} u_0(s_0) \otimes \bigotimes_{i=1}^{t} u_i(s_{i-1}, s_i) \tag{78}$$

we have

$$\alpha_t^{(z)}(s_t) = \sum_{s_{0:t-1}} z_0(s_0) \prod_{i=1}^{t} z_i(s_{i-1}, s_i), \tag{79}$$

$$\alpha_t^{(h)}(s_t) = \sum_{s_{0:t-1}} z_0(s_0) \prod_{i=1}^{t} z_i(s_{i-1}, s_i) \Big( h_0(s_0) + \sum_{j=1}^{t} h_j(s_{j-1}, s_j) \Big), \tag{80--81}$$

and, using the equality $p(s_{0:t} \mid o_{0:t}) = z_0(s_0) \prod_{i=1}^{t} z_i(s_{i-1}, s_i)$, we obtain

$$\alpha_t^{(z)}(s_t) = \sum_{s_{0:t-1}} p(s_{0:t} \mid o_{0:t}) = p(s_t \mid o_{0:t}), \tag{82}$$

$$\alpha_t^{(h)}(s_t) = \sum_{s_{0:t-1}} p(s_{0:t} \mid o_{0:t}) \log p(s_{0:t} \mid o_{0:t}). \tag{83}$$

The z-part of the ESR forward vector is the HMM forward probability as defined in Sections II and IV-B, while the information about the subsequence entropies is propagated through the h-part, so that at each step we have

$$H(S_{0:t} \mid o_{0:t}) = -\sum_{s_t} \alpha_t^{(h)}(s_t). \tag{84}$$
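The forward vector is initialized to $u_0(s_0)$ and, according to (70) and (71), we have

$$\alpha_0^{(z)}(s_0) = z_0(s_0) = \frac{\pi_{s_0} b_{s_0}(o_0)}{c_0}, \tag{85}$$

$$\alpha_0^{(h)}(s_0) = z_0(s_0)\, h_0(s_0) = \frac{\pi_{s_0} b_{s_0}(o_0)}{c_0} \log \frac{\pi_{s_0} b_{s_0}(o_0)}{c_0}. \tag{86}$$

The z and h parts of the forward recursive equation

$$\alpha_t(s_t) = \bigoplus_{s_{t-1}} u_t(s_{t-1}, s_t) \otimes \alpha_{t-1}(s_{t-1}) \tag{87}$$

can be determined using the definition of the entropy semiring as

$$\alpha_t^{(z)}(s_t) = \sum_{s_{t-1}} z_t(s_{t-1}, s_t)\, \alpha_{t-1}^{(z)}(s_{t-1}), \tag{88}$$

$$\alpha_t^{(h)}(s_t) = \sum_{s_{t-1}} z_t(s_{t-1}, s_t) \big( \alpha_{t-1}^{(h)}(s_{t-1}) + h_t(s_{t-1}, s_t)\, \alpha_{t-1}^{(z)}(s_{t-1}) \big), \tag{89}$$

or equivalently

$$\alpha_t^{(z)}(j) = \sum_{i=1}^{N} \frac{a_{ij} b_j(o_t)}{c_t}\, \alpha_{t-1}^{(z)}(i), \tag{90}$$

$$\alpha_t^{(h)}(j) = \sum_{i=1}^{N} \frac{a_{ij} b_j(o_t)}{c_t} \Big( \alpha_{t-1}^{(h)}(i) + \alpha_{t-1}^{(z)}(i) \log \frac{a_{ij} b_j(o_t)}{c_t} \Big). \tag{91}$$

Similarly as in Sections II and IV-B, the factors $c_t$ can be found by normalization of the z-parts:

$$c_0 = \sum_{j=1}^{N} \pi_j b_j(o_0), \qquad c_t = \sum_{j=1}^{N} \sum_{i=1}^{N} \alpha_{t-1}^{(z)}(i)\, a_{ij} b_j(o_t). \tag{92}$$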
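The forward recursion (85)–(92) can be checked directly against (84) on a toy model. In the minimal sketch below (the helper names `esr_plus`, `esr_times`, `kernel`, `esr_forward` and `prefix_entropy` and the toy parameters are hypothetical), ESR elements are Python tuples `(z, h)`, the operations follow (65)–(66), each local kernel is built as $(z, z \log z)$ as in (68)–(71), and the recursion is written literally as (87). The convention $0 \log 0 = 0$ is assumed for zero-probability kernels.

```python
import itertools
import math


def esr_plus(x, y):                    # eq. (65)
    return (x[0] + y[0], x[1] + y[1])


def esr_times(x, y):                   # eq. (66)
    return (x[0] * y[0], x[0] * y[1] + y[0] * x[1])


def kernel(z):                         # (z, z*h) with h = log z, eqs. (68)-(71)
    return (z, z * math.log(z)) if z > 0.0 else (0.0, 0.0)


def esr_forward(pi, A, B, obs):
    """Return the ESR forward vectors alpha_0, ..., alpha_T as lists of (z, h) pairs."""
    N = len(pi)
    w = [pi[j] * B[j][obs[0]] for j in range(N)]
    c = sum(w)                                          # c_0, eq. (92)
    alpha = [kernel(x / c) for x in w]                  # eqs. (85)-(86)
    out = [alpha]
    for o in obs[1:]:
        c = sum(alpha[i][0] * A[i][j] * B[j][o]         # c_t from the z-parts, eq. (92)
                for i in range(N) for j in range(N))
        new = []
        for j in range(N):
            acc = (0.0, 0.0)                            # ESR zero
            for i in range(N):
                u_t = kernel(A[i][j] * B[j][o] / c)     # local kernel u_t(i, j)
                acc = esr_plus(acc, esr_times(u_t, alpha[i]))   # eq. (87)
            new.append(acc)
        alpha = new
        out.append(alpha)
    return out


def prefix_entropy(pi, A, B, obs, t):
    """H(S_{0:t} | o_{0:t}) by enumeration, used to check eq. (84)."""
    N = len(pi)
    probs = []
    for s in itertools.product(range(N), repeat=t + 1):
        p = pi[s[0]] * B[s[0]][obs[0]]
        for k in range(1, t + 1):
            p *= A[s[k - 1]][s[k]] * B[s[k]][obs[k]]
        probs.append(p)
    z = sum(probs)
    return -sum(p / z * math.log(p / z) for p in probs if p > 0.0)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1]

alphas = esr_forward(pi, A, B, obs)
for t, alpha in enumerate(alphas):
    print(t, sum(a[0] for a in alpha), -sum(a[1] for a in alpha),
          prefix_entropy(pi, A, B, obs, t))
```

The z-parts of every $\alpha_t$ sum to one, as required by (92), and minus the sum of the h-parts reproduces the prefix entropy of (84).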
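The backward vector

$$\beta_t(s_t) = \bigoplus_{s_{t+1:T}} \bigotimes_{i=t+1}^{T} u_i(s_{i-1}, s_i) \tag{93}$$

has the corresponding z and h parts

$$\beta_t^{(z)}(s_t) = \sum_{s_{t+1:T}} \prod_{i=t+1}^{T} z_i(s_{i-1}, s_i), \tag{94}$$

$$\beta_t^{(h)}(s_t) = \sum_{s_{t+1:T}} \prod_{i=t+1}^{T} z_i(s_{i-1}, s_i) \cdot \sum_{j=t+1}^{T} h_j(s_{j-1}, s_j). \tag{95}$$

The equality (3) implies

$$\prod_{i=t+1}^{T} z_i(s_{i-1}, s_i) = \frac{p(s_{t+1:T}, o_{t+1:T} \mid s_t)}{p(o_{t+1:T} \mid o_{0:t})}, \tag{96}$$

and we have

$$\beta_t^{(z)}(s_t) = \frac{p(o_{t+1:T} \mid s_t)}{p(o_{t+1:T} \mid o_{0:t})}, \tag{97}$$

$$\beta_t^{(h)}(s_t) = \sum_{s_{t+1:T}} \frac{p(s_{t+1:T}, o_{t+1:T} \mid s_t)}{p(o_{t+1:T} \mid o_{0:t})} \log \frac{p(s_{t+1:T}, o_{t+1:T} \mid s_t)}{p(o_{t+1:T} \mid o_{0:t})}, \tag{98}$$

so the z-part of the ESR backward vector is the same as the HMM backward probability from Sections II and IV-B. The backward vector is initialized according to $\beta_T(s_T) = 1$:

$$\beta_T^{(z)}(s_T) = 1, \qquad \beta_T^{(h)}(s_T) = 0, \tag{99}$$

while the recursive equation

$$\beta_t(s_t) = \bigoplus_{s_{t+1}} u_{t+1}(s_t, s_{t+1}) \otimes \beta_{t+1}(s_{t+1}) \tag{100}$$

reduces to

$$\beta_t^{(z)}(s_t) = \sum_{s_{t+1}} z_{t+1}(s_t, s_{t+1})\, \beta_{t+1}^{(z)}(s_{t+1}), \tag{101}$$

$$\beta_t^{(h)}(s_t) = \sum_{s_{t+1}} z_{t+1}(s_t, s_{t+1}) \big( \beta_{t+1}^{(h)}(s_{t+1}) + h_{t+1}(s_t, s_{t+1})\, \beta_{t+1}^{(z)}(s_{t+1}) \big), \tag{102}$$

or equivalently

$$\beta_t^{(z)}(i) = \sum_{j=1}^{N} \frac{a_{ij} b_j(o_{t+1})}{c_{t+1}}\, \beta_{t+1}^{(z)}(j), \tag{103}$$

$$\beta_t^{(h)}(i) = \sum_{j=1}^{N} \frac{a_{ij} b_j(o_{t+1})}{c_{t+1}} \Big( \beta_{t+1}^{(h)}(j) + \beta_{t+1}^{(z)}(j) \log \frac{a_{ij} b_j(o_{t+1})}{c_{t+1}} \Big), \tag{104}$$

where the normalization constants $c_t$ are computed in the forward pass.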
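The backward recursion (99), (103)–(104) can be checked numerically against the definitions (94)–(98). The minimal sketch below uses hypothetical helper names (`normalizers`, `esr_backward`, `backward_by_enumeration`) and made-up toy parameters; it obtains the constants $c_t$ from a plain forward pass, runs the backward ESR recursion, and compares each $\beta_t^{(z)}(s_t)$ and $\beta_t^{(h)}(s_t)$ with the sums (97)–(98) evaluated by enumerating $s_{t+1:T}$. It assumes Python 3.8 or later for `math.prod`.

```python
import itertools
import math


def normalizers(pi, A, B, obs):
    """c_0, ..., c_T from the forward z-part recursion, eq. (92)."""
    N = len(pi)
    w = [pi[j] * B[j][obs[0]] for j in range(N)]
    cs = [sum(w)]
    alpha = [x / cs[0] for x in w]
    for o in obs[1:]:
        w = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o] for j in range(N)]
        cs.append(sum(w))
        alpha = [x / cs[-1] for x in w]
    return cs


def esr_backward(A, B, obs, cs):
    """{t: (beta_t^(z), beta_t^(h))} for t = T, ..., 0 via eqs. (99), (103), (104)."""
    N, T = len(A), len(obs) - 1
    bz, bh = [1.0] * N, [0.0] * N                       # eq. (99)
    out = {T: (bz, bh)}
    for t in range(T - 1, -1, -1):
        o, c = obs[t + 1], cs[t + 1]
        nz, nh = [0.0] * N, [0.0] * N
        for i in range(N):
            for j in range(N):
                z = A[i][j] * B[j][o] / c               # z_{t+1}(i, j)
                if z > 0.0:
                    nz[i] += z * bz[j]                                # eq. (103)
                    nh[i] += z * (bh[j] + bz[j] * math.log(z))        # eq. (104)
        bz, bh = nz, nh
        out[t] = (bz, bh)
    return out


def backward_by_enumeration(A, B, obs, cs, t, s_t):
    """Eqs. (97)-(98): sums over s_{t+1:T} of w = p(s_{t+1:T}, o_{t+1:T} | s_t) / prod c_k."""
    N, T = len(A), len(obs) - 1
    denom = math.prod(cs[t + 1:])                       # p(o_{t+1:T} | o_{0:t})
    bz = bh = 0.0
    for tail in itertools.product(range(N), repeat=T - t):
        prev, p = s_t, 1.0
        for k, s in enumerate(tail, start=t + 1):
            p *= A[prev][s] * B[s][obs[k]]
            prev = s
        w = p / denom
        bz += w
        bh += w * math.log(w) if w > 0.0 else 0.0
    return bz, bh


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1]

cs = normalizers(pi, A, B, obs)
betas = esr_backward(A, B, obs, cs)
print(betas[0])
```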

A. HMM entropy computation using ESRFB

If the summation of the global kernel (77) is performed over the whole sequence,

$$\bigoplus_{s} u(s) = \bigoplus_{s} \big( p(s \mid o),\ p(s \mid o) \log p(s \mid o) \big), \tag{105}$$

the z and h parts of the sum reduce to

$$\Big( \bigoplus_{s} u(s) \Big)^{(z)} = \sum_{s} p(s \mid o) = 1, \tag{106}$$

$$\Big( \bigoplus_{s} u(s) \Big)^{(h)} = \sum_{s} p(s \mid o) \log p(s \mid o) = -H(S \mid o). \tag{107}$$

The h-part of the sum thus gives the HMM entropy (with the opposite sign), and it can be found as a solution of the normalization problem

$$\bigoplus_{s} u(s) = \bigoplus_{s_T} \alpha_T(s_T) \tag{108}$$

using only the forward pass, according to equations (90)–(92), as follows.



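1) Initialization: For $j = 1, \dots, N$ set:

$$c_0 = \sum_{i=1}^{N} \pi_i b_i(o_0), \qquad \alpha_0^{(z)}(j) = \frac{\pi_j b_j(o_0)}{c_0}, \tag{109}$$

$$\alpha_0^{(h)}(j) = \frac{\pi_j b_j(o_0)}{c_0} \log \frac{\pi_j b_j(o_0)}{c_0}. \tag{110}$$

2) Induction: For $1 \le t \le T$, $1 \le j \le N$ compute

$$c_t = \sum_{k=1}^{N} \sum_{i=1}^{N} \alpha_{t-1}^{(z)}(i)\, a_{ik} b_k(o_t), \tag{111}$$

$$\alpha_t^{(z)}(j) = \sum_{i=1}^{N} \frac{a_{ij} b_j(o_t)}{c_t}\, \alpha_{t-1}^{(z)}(i), \tag{112}$$

$$\alpha_t^{(h)}(j) = \sum_{i=1}^{N} \frac{a_{ij} b_j(o_t)}{c_t} \Big( \alpha_{t-1}^{(h)}(i) + \alpha_{t-1}^{(z)}(i) \log \frac{a_{ij} b_j(o_t)}{c_t} \Big). \tag{113}$$

3) Termination: Terminate the algorithm with the summation

$$H(S \mid o) = -\sum_{j=1}^{N} \alpha_T^{(h)}(j). \tag{114}$$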
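Steps 1)–3) translate almost line by line into code. The following minimal sketch (hypothetical function names `esrfb_entropy` and `brute_force_entropy`; the two toy models are made up for illustration) implements (109)–(114) with plain lists and checks the result against the definition (15) by enumeration.

```python
import itertools
import math


def esrfb_entropy(pi, A, B, obs):
    """H(S|o) by the forward-only ESRFB pass, eqs. (109)-(114)."""
    N = len(pi)
    w = [pi[j] * B[j][obs[0]] for j in range(N)]
    c = sum(w)                                                # c_0, eq. (109)
    az = [x / c for x in w]                                   # eq. (109)
    ah = [z * math.log(z) if z > 0.0 else 0.0 for z in az]    # eq. (110)
    for o in obs[1:]:
        c = sum(az[i] * A[i][k] * B[k][o] for i in range(N) for k in range(N))  # (111)
        nz, nh = [0.0] * N, [0.0] * N
        for j in range(N):
            for i in range(N):
                z = A[i][j] * B[j][o] / c
                if z > 0.0:
                    nz[j] += z * az[i]                              # eq. (112)
                    nh[j] += z * (ah[i] + az[i] * math.log(z))      # eq. (113)
        az, ah = nz, nh                   # alpha_{t-1} is no longer needed
    return -sum(ah)                                           # eq. (114)


def brute_force_entropy(pi, A, B, obs):
    """H(S|o) by enumerating all state sequences, eq. (15)."""
    N = len(pi)
    joint = []
    for s in itertools.product(range(N), repeat=len(obs)):
        p = pi[s[0]] * B[s[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[s[t - 1]][s[t]] * B[s[t]][obs[t]]
        joint.append(p)
    p_o = sum(joint)
    return -sum(p / p_o * math.log(p / p_o) for p in joint if p > 0.0)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1]

pi3 = [0.5, 0.3, 0.2]
A3 = [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]
B3 = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
obs3 = [0, 1, 1, 0, 1, 0]

print(esrfb_entropy(pi, A, B, obs), brute_force_entropy(pi, A, B, obs))
print(esrfb_entropy(pi3, A3, B3, obs3), brute_force_entropy(pi3, A3, B3, obs3))
```

Apart from the model parameters, only `az`, `ah` and the current `c` are kept between time steps, which corresponds to the $O(N)$ memory figure given for this algorithm.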

The algorithm runs in $O(N^2T)$ time using $O(N)$ space, as does Hernando et al.'s algorithm. Moreover, both algorithms recursively compute the forward probability

$$\alpha_t^{(z)}(s_t) = p(s_t \mid o_{0:t}). \tag{115}$$

The two algorithms differ in the second quantity which is computed. In Hernando et al.'s algorithm it is the intermediate entropy

$$H_t(s_t) = H(S_{0:t-1} \mid s_t, o_{0:t}) = -\sum_{s_{0:t-1}} p(s_{0:t-1} \mid s_t, o_{0:t}) \log p(s_{0:t-1} \mid s_t, o_{0:t}), \tag{116}$$

while in ESRFB it is the h-part of the forward vector:

$$\alpha_t^{(h)}(s_t) = \sum_{s_{0:t-1}} p(s_{0:t} \mid o_{0:t}) \log p(s_{0:t} \mid o_{0:t}). \tag{117}$$

The relation between these quantities,

$$\alpha_t^{(h)}(s_t) = -\alpha_t^{(z)}(s_t)\, H_t(s_t) + \alpha_t^{(z)}(s_t) \log \alpha_t^{(z)}(s_t), \tag{118}$$

can easily be derived by elementary probability transformations, starting from $p(s_{0:t} \mid o_{0:t}) = \alpha_t^{(z)}(s_t)\, p(s_{0:t-1} \mid s_t, o_{0:t})$.
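Relation (118) can also be observed numerically. The minimal sketch below (hypothetical function `lockstep_gap`, toy parameters; strictly positive $\pi$, $A$ and $B$ are assumed so that every logarithm is finite) advances the ESRFB recursion (112)–(113) and Hernando et al.'s recursion (38)–(39) side by side and returns the largest deviation from (118) over all $t$ and $s_t$, which should be at the level of floating-point round-off.

```python
import math


def lockstep_gap(pi, A, B, obs):
    """Max over t, j of |alpha_t^(h)(j) - alpha_t^(z)(j) * (log alpha_t^(z)(j) - H_t(j))|."""
    N = len(pi)
    w = [pi[j] * B[j][obs[0]] for j in range(N)]
    c = sum(w)
    az = [x / c for x in w]                      # alpha^(z), also Hernando's alpha
    ah = [z * math.log(z) for z in az]           # alpha^(h), eq. (110)
    H = [0.0] * N                                # Hernando's H_0, eq. (35)

    def gap():
        # Deviation from eq. (118) for the current az, ah and H.
        return max(abs(ah[j] - (-az[j] * H[j] + az[j] * math.log(az[j])))
                   for j in range(N))

    worst = gap()
    for o in obs[1:]:
        c = sum(az[i] * A[i][k] * B[k][o] for i in range(N) for k in range(N))
        nz, nh, nH = [0.0] * N, [0.0] * N, [0.0] * N
        for j in range(N):
            pred = sum(az[k] * A[k][j] for k in range(N))
            for i in range(N):
                z = A[i][j] * B[j][o] / c
                nz[j] += z * az[i]                              # eq. (112)
                nh[j] += z * (ah[i] + az[i] * math.log(z))      # eq. (113)
                p = az[i] * A[i][j] / pred                      # eq. (38)
                nH[j] += p * (H[i] - math.log(p))               # eq. (39)
        az, ah, H = nz, nh, nH
        worst = max(worst, gap())
    return worst


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1]
print(lockstep_gap(pi, A, B, obs))
```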

Furthermore, from the HMM joint probability factorization (1) we can derive the Markov properties

$$p(o_t \mid s_t, s_{t-1}, o_{0:t-1}) = p(o_t \mid s_t), \tag{119}$$

$$p(s_t \mid s_{t-1}, o_{0:t-1}) = p(s_t \mid s_{t-1}), \tag{120}$$

which imply the following equalities:

$$\frac{a_{s_{t-1}s_t} b_{s_t}(o_t)}{c_t} = \frac{p(s_t \mid s_{t-1})\, p(o_t \mid s_t)}{p(o_t \mid o_{0:t-1})} = \frac{p(o_t, s_t \mid s_{t-1}, o_{0:t-1})}{p(o_t \mid o_{0:t-1})} = \frac{p_{t-1|t}(s_{t-1} \mid s_t)\, \alpha_t^{(z)}(s_t)}{\alpha_{t-1}^{(z)}(s_{t-1})}, \tag{121}$$

where $p_{t-1|t}(s_{t-1} \mid s_t) = p(s_{t-1} \mid s_t, o_{0:t})$ as defined in Hernando et al.'s algorithm. Then the recursive equations for $H_t(s_t)$ derived by Hernando et al. can also be obtained from the ESRFB algorithm by substituting (121) and (118) into the recursive equations for $\alpha_t^{(h)}$, which shows the close relation between the two algorithms.
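Identity (121) is equally easy to check on a toy model. In the minimal sketch below (hypothetical function name `identity_121_gap`; strictly positive $\pi$, $A$ and $B$ assumed), $p_{t-1|t}$ is computed from (38), and the two sides of (121) are compared for every $t$, $s_{t-1}$ and $s_t$.

```python
def identity_121_gap(pi, A, B, obs):
    """Max |a_ij b_j(o_t)/c_t - p_{t-1|t}(i|j) alpha_t(j) / alpha_{t-1}(i)| over t, i, j."""
    N = len(pi)
    w = [pi[j] * B[j][obs[0]] for j in range(N)]
    alpha = [x / sum(w) for x in w]                              # alpha_0
    worst = 0.0
    for o in obs[1:]:
        pred = [sum(alpha[k] * A[k][j] for k in range(N)) for j in range(N)]
        c = sum(pred[j] * B[j][o] for j in range(N))             # c_t
        new_alpha = [pred[j] * B[j][o] / c for j in range(N)]    # alpha_t
        for j in range(N):
            for i in range(N):
                p_back = alpha[i] * A[i][j] / pred[j]            # eq. (38)
                lhs = A[i][j] * B[j][o] / c
                rhs = p_back * new_alpha[j] / alpha[i]           # eq. (121)
                worst = max(worst, abs(lhs - rhs))
        alpha = new_alpha
    return worst


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1]
print(identity_121_gap(pi, A, B, obs))
```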

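B. HMM subsequence constrained entropy computation using ESRFB

If the summation of the global kernel is performed over the subsequence $s_{-l:r}$,

$$\bigoplus_{s_{-l:r}} u(s) = \bigoplus_{s_{-l:r}} \big( p(s \mid o),\ p(s \mid o) \log p(s \mid o) \big), \tag{122}$$

the z and h parts of the sum are

$$\Big( \bigoplus_{s_{-l:r}} u(s) \Big)^{(z)} = p(s_{l:r} \mid o), \tag{123}$$

$$\Big( \bigoplus_{s_{-l:r}} u(s) \Big)^{(h)} = -H(S_{-l:r}, s_{l:r} \mid o). \tag{124}$$

The h-part of the sum gives the HMM subsequence constrained entropy (with the opposite sign), and it can be found as a solution of the marginalization problem

$$v_{l:r}(s_{l:r}) = \alpha_l(s_l) \otimes \bigotimes_{t=l+1}^{r} u_t(s_{t-1}, s_t) \otimes \beta_r(s_r). \tag{125}$$

The z and h parts can be found using the definition of the entropy semiring operations:

$$\Big( \bigoplus_{s_{-l:r}} u(s) \Big)^{(z)} = \alpha_l^{(z)}(s_l)\, \beta_r^{(z)}(s_r) \prod_{i=l+1}^{r} z_i(s_{i-1}, s_i), \tag{126}$$

$$\Big( \bigoplus_{s_{-l:r}} u(s) \Big)^{(h)} = \prod_{i=l+1}^{r} z_i(s_{i-1}, s_i) \Big( \alpha_l^{(z)}(s_l)\, \beta_r^{(h)}(s_r) + \alpha_l^{(h)}(s_l)\, \beta_r^{(z)}(s_r) + \alpha_l^{(z)}(s_l)\, \beta_r^{(z)}(s_r) \sum_{j=l+1}^{r} h_j(s_{j-1}, s_j) \Big). \tag{127}$$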
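The step from (125) to (126)–(127) can be written out explicitly. Abbreviate $\alpha_l(s_l) = (a_z, a_h)$, $\beta_r(s_r) = (b_z, b_h)$ and, by Lemma 1, $\bigotimes_{t=l+1}^{r} u_t(s_{t-1}, s_t) = (Z, Z\Sigma)$ with $Z = \prod_{t=l+1}^{r} z_t(s_{t-1}, s_t)$ and $\Sigma = \sum_{t=l+1}^{r} h_t(s_{t-1}, s_t)$; these short names are introduced only for this derivation. Two applications of (66) give:

```latex
\begin{aligned}
\alpha_l(s_l) \otimes (Z, Z\Sigma)
  &= \big(a_z Z,\; a_z Z \Sigma + Z a_h\big), \\
\big(a_z Z,\; a_z Z \Sigma + Z a_h\big) \otimes (b_z, b_h)
  &= \big(a_z Z b_z,\; a_z Z b_h + b_z\,(a_z Z \Sigma + Z a_h)\big) \\
  &= \Big(Z\, a_z b_z,\; Z\,\big(a_z b_h + a_h b_z + a_z b_z \Sigma\big)\Big).
\end{aligned}
```

The first component is (126) and the second is (127).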

To compute the h-part of the marginal, we need the $l$-th forward and the $r$-th backward vectors. The $l$-th forward vector can be computed by the ESR forward algorithm using the recursive equations (109)–(113). However, the recursive steps (111)–(112) for the normalization constants $c_t$ and the z-parts of the forward vectors must be performed for all $t$, because the normalization constants $c_t$, $l < t \le T$, must be available in the backward pass and in the termination phase. Once the normalization constants are computed, the backward pass can be performed according to equations (99) and (103)–(104), and after that we can compute the subsequence constrained entropy using equalities (126)–(127) and (18). The algorithm follows.

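1) Forward initialization: For $j = 1, \dots, N$ set:

$$c_0 = \sum_{i=1}^{N} \pi_i b_i(o_0), \qquad \alpha_0^{(z)}(j) = \frac{\pi_j b_j(o_0)}{c_0}, \tag{128}$$

$$\alpha_0^{(h)}(j) = \frac{\pi_j b_j(o_0)}{c_0} \log \frac{\pi_j b_j(o_0)}{c_0}. \tag{129}$$

2) Full forward recursion: For $1 \le t \le l$, $1 \le j \le N$ compute

$$c_t = \sum_{k=1}^{N} \sum_{i=1}^{N} \alpha_{t-1}^{(z)}(i)\, a_{ik} b_k(o_t), \tag{130}$$

$$\alpha_t^{(z)}(j) = \sum_{i=1}^{N} \frac{a_{ij} b_j(o_t)}{c_t}\, \alpha_{t-1}^{(z)}(i), \tag{131}$$

$$\alpha_t^{(h)}(j) = \sum_{i=1}^{N} \frac{a_{ij} b_j(o_t)}{c_t} \Big( \alpha_{t-1}^{(h)}(i) + \alpha_{t-1}^{(z)}(i) \log \frac{a_{ij} b_j(o_t)}{c_t} \Big). \tag{132}$$

3) Forward z-part recursion: For $l+1 \le t \le T$, $1 \le j \le N$ compute

$$c_t = \sum_{k=1}^{N} \sum_{i=1}^{N} \alpha_{t-1}^{(z)}(i)\, a_{ik} b_k(o_t), \tag{133}$$

$$\alpha_t^{(z)}(j) = \sum_{i=1}^{N} \frac{a_{ij} b_j(o_t)}{c_t}\, \alpha_{t-1}^{(z)}(i). \tag{134}$$

4) Backward initialization: For $j = 1, \dots, N$ set:

$$\beta_T^{(z)}(j) = 1, \qquad \beta_T^{(h)}(j) = 0. \tag{135}$$

5) Backward recursion: For $T-1 \ge t \ge r$, $1 \le i \le N$ compute

$$\beta_t^{(z)}(i) = \sum_{j=1}^{N} \frac{a_{ij} b_j(o_{t+1})}{c_{t+1}}\, \beta_{t+1}^{(z)}(j), \tag{136}$$

$$\beta_t^{(h)}(i) = \sum_{j=1}^{N} \frac{a_{ij} b_j(o_{t+1})}{c_{t+1}} \Big( \beta_{t+1}^{(h)}(j) + \beta_{t+1}^{(z)}(j) \log \frac{a_{ij} b_j(o_{t+1})}{c_{t+1}} \Big). \tag{137}$$

6) Termination: For every subsequence $s_{l:r}$ ($1 \le s_t \le N$ for $l \le t \le r$), compute the subsequence constrained entropy:

$$p(s_{l:r} \mid o) = \alpha_l^{(z)}(s_l)\, \beta_r^{(z)}(s_r) \prod_{t=l+1}^{r} \frac{a_{s_{t-1}s_t} b_{s_t}(o_t)}{c_t}, \tag{138}$$

$$H(S_{-l:r}, s_{l:r} \mid o) = -\prod_{t=l+1}^{r} \frac{a_{s_{t-1}s_t} b_{s_t}(o_t)}{c_t} \Big( \alpha_l^{(z)}(s_l)\, \beta_r^{(h)}(s_r) + \alpha_l^{(h)}(s_l)\, \beta_r^{(z)}(s_r) + \alpha_l^{(z)}(s_l)\, \beta_r^{(z)}(s_r) \sum_{j=l+1}^{r} \log \frac{a_{s_{j-1}s_j} b_{s_j}(o_j)}{c_j} \Big), \tag{139}$$

$$H(S_{-l:r} \mid s_{l:r}, o) = \frac{H(S_{-l:r}, s_{l:r} \mid o)}{p(s_{l:r} \mid o)} + \log p(s_{l:r} \mid o). \tag{140}$$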
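Steps 1)–6) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function names `subsequence_entropies` and `subsequence_by_enumeration` and the toy parameters are hypothetical, strictly positive $A$ and $B$ are assumed so that every logarithm in (137) and (139) is finite, and the enumeration evaluates (16) and (17) directly, so it also serves as an independent check of (18).

```python
import itertools
import math


def subsequence_entropies(pi, A, B, obs, l, r):
    """Eqs. (128)-(140); returns {s_lr: (p(s_lr|o), H(S_-lr, s_lr|o), H(S_-lr|s_lr, o))}."""
    N, T = len(pi), len(obs) - 1
    w = [pi[j] * B[j][obs[0]] for j in range(N)]
    c = sum(w)
    az = [x / c for x in w]                                   # eq. (128)
    ah = [z * math.log(z) for z in az]                        # eq. (129)
    alpha_l = (az, ah)
    cs = {}                                                   # only c_t with t > l are kept
    for t in range(1, T + 1):
        o = obs[t]
        c = sum(az[i] * A[i][k] * B[k][o] for i in range(N) for k in range(N))
        nz, nh = [0.0] * N, [0.0] * N
        for j in range(N):
            for i in range(N):
                z = A[i][j] * B[j][o] / c
                nz[j] += z * az[i]                            # eqs. (131), (134)
                if t <= l:                                    # h-part only up to l, (132)
                    nh[j] += z * (ah[i] + az[i] * math.log(z))
        az, ah = nz, nh
        if t == l:
            alpha_l = (az, ah)
        if t > l:
            cs[t] = c                                         # eq. (133)
    bz, bh = [1.0] * N, [0.0] * N                             # eq. (135)
    for t in range(T - 1, r - 1, -1):                         # eqs. (136)-(137)
        o, c = obs[t + 1], cs[t + 1]
        nz, nh = [0.0] * N, [0.0] * N
        for i in range(N):
            for j in range(N):
                z = A[i][j] * B[j][o] / c
                nz[i] += z * bz[j]
                nh[i] += z * (bh[j] + bz[j] * math.log(z))
        bz, bh = nz, nh
    alz, alh = alpha_l
    result = {}
    for seg in itertools.product(range(N), repeat=r - l + 1):
        Z, h_sum = 1.0, 0.0
        for t in range(l + 1, r + 1):
            z = A[seg[t - l - 1]][seg[t - l]] * B[seg[t - l]][obs[t]] / cs[t]
            Z *= z
            h_sum += math.log(z)
        a_z, a_h, b_z, b_h = alz[seg[0]], alh[seg[0]], bz[seg[-1]], bh[seg[-1]]
        p = Z * a_z * b_z                                             # eq. (138)
        joint = -Z * (a_z * b_h + a_h * b_z + a_z * b_z * h_sum)      # eq. (139)
        result[seg] = (p, joint, joint / p + math.log(p))             # eq. (140)
    return result


def subsequence_by_enumeration(pi, A, B, obs, l, r):
    """Same triples from eqs. (4), (17) and (16) by enumerating all state sequences."""
    N, T = len(pi), len(obs) - 1
    joint = {}
    for s in itertools.product(range(N), repeat=T + 1):
        p = pi[s[0]] * B[s[0]][obs[0]]
        for t in range(1, T + 1):
            p *= A[s[t - 1]][s[t]] * B[s[t]][obs[t]]
        joint[s] = p
    p_o = sum(joint.values())
    marg, hj = {}, {}
    for s, p in joint.items():
        q, seg = p / p_o, s[l:r + 1]
        marg[seg] = marg.get(seg, 0.0) + q                    # eq. (4)
        hj[seg] = hj.get(seg, 0.0) - q * math.log(q)          # eq. (17)
    cond = {}
    for s, p in joint.items():
        seg = s[l:r + 1]
        qc = p / p_o / marg[seg]                              # p(s_-lr | s_lr, o)
        cond[seg] = cond.get(seg, 0.0) - qc * math.log(qc)    # eq. (16)
    return {k: (marg[k], hj[k], cond[k]) for k in marg}


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1]

res = subsequence_entropies(pi, A, B, obs, 1, 2)
ref = subsequence_by_enumeration(pi, A, B, obs, 1, 2)
for seg in sorted(res):
    print(seg, res[seg], ref[seg])
```

Only the constants $c_t$ for $t > l$ are stored across the passes; the forward and backward vectors are overwritten at each step, in line with the memory analysis for this algorithm.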

The time complexity of the algorithm is $O(N^2T + N^{r-l})$, where $O(N^2T)$ is for the forward-backward recursion and $O(N^{r-l})$ for the termination phase; this is the same time complexity as that of Mann and McCallum's algorithm.

On the other hand, the full forward recursion phase can be realized in $O(N^2 l)$ time and in fixed-size memory $O(N)$, since $c_{t-1}$, $\alpha_{t-1}^{(z)}$ and $\alpha_{t-1}^{(h)}$ can be deleted after having been used for the computation of $\alpha_t^{(z)}$, $\alpha_t^{(h)}$ and $c_t$. Similarly, the forward z-part recursion and the backward pass require $O(N)$ space. The only additional space that depends on the sequence length, $O(T - l)$, is needed for the normalization constants computed in the forward z-part recursion phase, since they must be available in the backward and termination phases. Finally, taking into account the $O(N^{r-l})$ space required for storing the results of the termination phase, the total memory complexity is $O(T - l + N^{r-l})$, which grows more slowly with $T$ than the $O(NT + N^{r-l})$ required by Mann and McCallum's algorithm.

VI. Conclusion 

This paper proposes a new algorithm for the memory-efficient computation of the HMM entropy and the subsequence constrained entropy when the observation sequence is given. The algorithm is called the Entropy Semiring Forward-backward algorithm (ESRFB), since it is based on forward-backward recursion over the entropy semiring, in the same manner as in our previous paper [?].

ESRFB has the same time complexity as the previously developed algorithm for subsequence constrained HMM entropy computation by Mann and McCallum [9], but lower memory requirements. It is also applicable to state sequence entropy computation, running with the same time and memory complexity as the recursive algorithm proposed by Hernando et al. [6]. In addition, we have shown how the recursive equations of Hernando et al.'s algorithm can be derived from the ESRFB recursive equations.